Learning Structured Visual Detectors from User Input at Multiple Levels

نویسندگان

Alejandro Jaimes

Shih-Fu Chang

چکیده

In this paper, we propose a new framework for the dynamic construction of structured visual object/scene detectors for content-based retrieval. In the Visual Apprentice, a user defines visual object/scene models via a multiple-level definition hierarchy: a scene consists of objects, which consist of object-parts, which consist of perceptual-areas, which consist of regions. The user trains the system by providing example images/videos and labeling components according to the hierarchy she defines (e.g., image of two people shaking hands contains two faces and a handshake). As the user trains the system, visual features (e.g., color, texture, motion, etc.) are extracted from each example provided, for each node of the hierarchy (defined by the user). Various machine learning algorithms are then applied to the training data, at each node, to learn classifiers. The best classifiers and features are then automatically selected for each node (using cross-validation on the training data). The process yields a Visual Object/Scene Detector (e.g., for a handshake), which consists of an hierarchy of classifiers as it was defined by the user. The Visual Detector classifies new images/videos by first automatically segmenting them, and applying the classifiers according to the hierarchy: regions are classified first, followed by the classification of perceptual-areas, object-parts, and objects. We discuss how the concept of Recurrent Visual Semantics can be used to identify domains in which learning techniques such as the one presented can be applied. We then present experimental results using several hierarchies for classifying images and video shots (e.g., Baseball video, images that contain handshakes, skies, etc.). These results, which show good performance, demonstrate the feasibility, and usefulness of dynamic approaches for constructing structured visual object/scene detectors from user input at multiple levels.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integrating Multiple Classifiers In Visual Object Detectors Learned From User Input

There have been many recent efforts in contentbased retrieval to perform automatic classification of images/visual objects. Most approaches, however, have focused on using individual classifiers. In this paper, we study the way in which, in a dynamic framework, multiple classifiers can be combined when applying Visual Object Detectors. We propose a hybrid classifier combination approach, in whi...

متن کامل

بازیابی تعاملی تصاویر طبیعت با بهره گیری از یادگیری چند نمونه ای

Content-based image retrieval (CBIR) has received considerable research interest in the recent years. The basic problem in CBIR is the semantic gap between the high-level image semantics and the low-level image features. Region-based image retrieval and learning from user interaction through relevance feedback are two main approaches to solving this problem. Recently, the research in integra...

متن کامل

The Comparative Effect of Visual vs. Auditory Input Enhancement on Learning Non-Congruent Phrasal Verbs by Iranian EFL Learners

Vocabulary is one of the essential components of language and learning phrasal verbs as part of vocabulary is quite challenging for foreign language learners. The present study aimed at investigating the effects of visual and auditory input enhancement on learning non-congruent phrasal verbs. The participants of the study were 90 intermediate English language learners who were divided into two ...

متن کامل

Learning Person Detectors from Multiple Cameras

Recently, object detection by combining information from multiple cameras has become popular. We adopt these ideas for on-line learning of object detectors from multiple cameras. Assuming that each camera corresponds to a classifier, we extend the original co-training approach to visual learning by exploring geometrical information extracted from the visual input. In this way, new, very valuabl...

متن کامل

The Impact of Learning Styles on the Iranian EFL Learners' Input Processing

This research study explored the impact of learning styles and input modalities on the second language (L2) learners' input processing (IP). This study also sought to appraise the usefulness of Processing Instruction (PI) and its components in relation to the learners' learning styles and input modalities. To this end, 73 male and female Iranian EFL learners from Islamic Azad University, North ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

Int. J. Image Graphics

دوره 1 شماره

صفحات -

تاریخ انتشار 2001

Learning Structured Visual Detectors from User Input at Multiple Levels

نویسندگان

چکیده

منابع مشابه

Integrating Multiple Classifiers In Visual Object Detectors Learned From User Input

بازیابی تعاملی تصاویر طبیعت با بهره گیری از یادگیری چند نمونه ای

The Comparative Effect of Visual vs. Auditory Input Enhancement on Learning Non-Congruent Phrasal Verbs by Iranian EFL Learners

Learning Person Detectors from Multiple Cameras

The Impact of Learning Styles on the Iranian EFL Learners' Input Processing

عنوان ژورنال:

اشتراک گذاری